In [3]:
import numpy as np
from scipy.spatial.distance import cdist
from scipy.special import expit
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes
The polynomial basis function is provided by scikit-learn
in the sklearn.preprocessing module.
In [4]:
X = np.arange(1, 9).reshape(4, 2)
X
Out[4]:
In [5]:
PolynomialFeatures(degree=2).fit_transform(X)
Out[5]:
Unfortunately, this is pretty much the extent of what scikit-learn
provides in the way of basis functions. Here we define some standard basis functions, while adhering to the scikit-learn
interface. This will be important when we try to incorporate our basis functions in pipelines and feature unions later on. While this is not strictly required, it will certainly make life easier for us down the road.
In [53]:
class RadialFeatures(BaseEstimator, TransformerMixin):
    def __init__(self, mu=0, s=1):
        self.mu = mu  # basis function centers, one per row
        self.s = s    # length scale

    def fit(self, X, y=None):
        # this basis function is stateless;
        # fit need only return self
        return self

    def transform(self, X, y=None):
        # Gaussian radial basis function centered at each row of mu
        return np.exp(-cdist(X, self.mu, 'sqeuclidean')/(2*self.s**2))
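Concretely, the transform computes the Gaussian radial basis function $\phi_j(\mathbf{x}) = \exp\left(-\frac{\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2}{2s^2}\right)$, where each row $\boldsymbol{\mu}_j$ of mu is a center and s is a shared scale, so the output has one feature per center.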
In [54]:
class SigmoidalFeatures(BaseEstimator, TransformerMixin):
    def __init__(self, mu=0, s=1):
        self.mu = mu  # basis function centers, one per row
        self.s = s    # slope parameter

    def fit(self, X, y=None):
        # this basis function is stateless;
        # fit need only return self
        return self

    def transform(self, X, y=None):
        # logistic sigmoid applied to the scaled distance to each center
        return expit(cdist(X, self.mu)/self.s)
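Likewise, SigmoidalFeatures computes $\phi_j(\mathbf{x}) = \sigma\left(\lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert / s\right)$ for each center $\boldsymbol{\mu}_j$, where $\sigma(a) = \frac{1}{1 + e^{-a}}$ is the logistic sigmoid provided by scipy.special.expit.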
In [8]:
mu = np.linspace(0.1, 1, 10).reshape(5, 2)
mu
Out[8]:
In [9]:
RadialFeatures(mu=mu).fit_transform(X).round(2)
Out[9]:
In [10]:
SigmoidalFeatures(mu=mu).fit_transform(X).round(2)
Out[10]:
Now that we have a few basis functions at our disposal, let's try to apply different basis functions to different features of a dataset. We use the diabetes dataset, a real-world dataset with 442 instances and 10 features. We first work through each step manually, and then show how the steps can be combined using scikit-learn's feature unions and pipelines to form a single model that performs all the necessary steps in one fell swoop.
In [11]:
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
In [12]:
X.shape
Out[12]:
In [13]:
y.shape
Out[13]:
We print every other feature for just the first few instances, to get an idea of what the data looks like.
In [14]:
# sanity check
X[:5, ::2]
Out[14]:
In [15]:
# sanity check
y[:5]
Out[15]:
Assume for some reason we are interested in training a model using, say, features 2 and 5 with a polynomial basis, and features 6, 8 and 9 with a radial basis. We first slice up our original dataset.
In [16]:
X1 = X[:, np.array([2, 5])]
X1.shape
Out[16]:
In [17]:
# sanity check
X1[:5]
Out[17]:
In [18]:
X2 = X[:, np.array([6, 8, 9])]
X2.shape
Out[18]:
In [19]:
# sanity check
X2[:5]
Out[19]:
Now we apply the respective basis functions.
In [20]:
X1_poly = PolynomialFeatures().fit_transform(X1)
X1_poly.shape
Out[20]:
In [21]:
# sanity check
X1_poly[:5].round(2)
Out[21]:
In [22]:
mu = np.linspace(0, 1, 6).reshape(2, 3)
mu
Out[22]:
In [23]:
X2_radial = RadialFeatures(mu).fit_transform(X2)
X2_radial.shape
Out[23]:
In [24]:
# sanity check
X2_radial[:5].round(2)
Out[24]:
Now we're ready to concatenate these augmented datasets.
In [25]:
X_concat = np.hstack((X1_poly, X2_radial))
X_concat.shape
Out[25]:
In [26]:
# sanity check
X_concat[:5, ::2].round(2)
Out[26]:
Now we are ready to train a regressor with this augmented dataset. For this example, we'll simply use a linear regression model.
In [27]:
model = LinearRegression()
model.fit(X_concat, y)
Out[27]:
In [28]:
model.score(X_concat, y)
Out[28]:
(To no one's surprise, our model performs quite poorly, since zero effort was made to identify and incorporate the most informative features or appropriate basis functions. Rather, they were chosen solely to maximize clarity of exposition.)
So let's recap what we've done.

1. We started with the 442x10 matrix X.
2. We sliced X to obtain the 442 by 2 matrix X1 (features 2 and 5) and the 442 by 3 matrix X2 (features 6, 8 and 9).
3. We applied a polynomial basis function of degree 2 to X1, which has 2 features and 442 samples. This returns a dataset X1_poly with $\begin{pmatrix} 4 \\ 2 \end{pmatrix} = 6$ features and 442 samples. (NB: In general, the number of output features for a polynomial basis function of degree $d$ on $n$ features is the number of multisets of cardinality $d$, with elements taken from a finite set of cardinality $n+1$, which is given by the multiset coefficient $\begin{pmatrix} \begin{pmatrix} n + 1 \\ d \end{pmatrix} \end{pmatrix} = \begin{pmatrix} n + d \\ d \end{pmatrix}$.) So from the 442 by 2 matrix X1 we obtain the 442 by 6 matrix X1_poly.
4. We applied a radial basis function with the 2 centers mu to X2. From the 442 by 3 matrix X2, we obtain the 442 by 2 matrix X2_radial.
5. We horizontally concatenated the 442 by 6 matrix X1_poly with the 442 by 2 matrix X2_radial to obtain the final 442 by 8 matrix X_concat.
6. We trained a linear regression model on X_concat.

So this is how we went from the 442x10 matrix X to the 442x8 matrix X_concat.
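As a quick sanity check of the feature-count formula above, here is a small sketch (it imports scipy.special.comb, which is not otherwise used in this notebook) comparing the multiset coefficient against the actual output width of PolynomialFeatures on X1:

from scipy.special import comb

n, d = 2, 2  # number of input features and polynomial degree
expected = comb(n + d, d, exact=True)  # multiset coefficient: binom(n + d, d) = 6
actual = PolynomialFeatures(degree=d).fit_transform(X1).shape[1]
expected == actual  # True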
Now let's see how these steps can be combined using scikit-learn's pipelines and feature unions. First we define a transformer that slices up the input data. Note that instead of working with (tuples of) slice objects directly, it is usually more convenient to use the NumPy helper np.index_exp. We explain later why this transformer is necessary.
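For example, np.index_exp simply builds the tuple of indexing objects that the corresponding bracket expression would produce:

np.index_exp[:, np.array([2, 5])]
# (slice(None, None, None), array([2, 5]))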
In [47]:
class ArraySlicer(BaseEstimator, TransformerMixin):
    def __init__(self, index_exp):
        self.index_exp = index_exp  # e.g. np.index_exp[:, np.array([2, 5])]

    def fit(self, X, y=None):
        # stateless: nothing to fit
        return self

    def transform(self, X, y=None):
        # simply index the input array with the stored index expression
        return X[self.index_exp]
In [48]:
model = \
    make_pipeline(
        make_union(
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([2, 5])]),
                PolynomialFeatures()
            ),
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([6, 8, 9])]),
                RadialFeatures(mu)
            )
        )
    )
In [49]:
model.fit(X)
Out[49]:
In [50]:
model.transform(X).shape
Out[50]:
In [51]:
# sanity check
model.transform(X)[:5, ::2].round(2)
Out[51]:
This effectively composes each of the steps we had to perform manually and amalgamates them into a single transformer. We can even append a regressor at the end to make it a complete estimator/predictor.
In [34]:
model = \
    make_pipeline(
        make_union(
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([2, 5])]),
                PolynomialFeatures()
            ),
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([6, 8, 9])]),
                RadialFeatures(mu)
            )
        ),
        LinearRegression()
    )
In [35]:
model.fit(X, y)
Out[35]:
In [36]:
model.score(X, y)
Out[36]:
The most important thing to note is that everything in scikit-learn is either a transformer or a predictor, and almost always an estimator. An estimator is simply a class that implements the fit method, while a transformer and a predictor implement a, well, transform and predict method, respectively. From this simple interface, we get a surprisingly high amount of functionality and flexibility.
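For instance, because our custom basis functions derive from TransformerMixin, they get fit_transform for free; a quick check (reusing the X2 and mu defined above):

rf = RadialFeatures(mu)
# fit_transform is equivalent to calling fit followed by transform
np.allclose(rf.fit_transform(X2), rf.fit(X2).transform(X2))  # True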
A pipeline behaves as a transformer or a predictor depending on what the last step of the pipeline is. If the last step is a transformer, the entire pipeline is a transformer and one can call fit, transform or fit_transform on it like on an ordinary transformer. The same is true if the last step is a predictor. Essentially, all a pipeline does is chain the fit_transform calls of every transformer in the pipeline. If we think of ordinary transformers as functions, pipelines can be thought of as higher-order functions that simply compose an arbitrary number of functions.
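As a minimal sketch of this composition view (stacking two polynomial transformers purely for illustration), a pipeline's fit_transform is equivalent to chaining the steps' fit_transform calls by hand:

pipe = make_pipeline(PolynomialFeatures(), PolynomialFeatures())
chained = PolynomialFeatures().fit_transform(PolynomialFeatures().fit_transform(X))
np.allclose(pipe.fit_transform(X), chained)  # True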
In [55]:
model = \
    make_pipeline(
        PolynomialFeatures(),  # transformer
        LinearRegression()     # predictor
    )
In [56]:
model.fit(X, y)
Out[56]:
In [57]:
model.score(X, y)
Out[57]:
A union is a transformer that is initialized with an arbitrary number of transformers. When fit_transform is called on a dataset, it simply calls fit_transform on each of the transformers it was given and horizontally concatenates their results.
In [60]:
mu_ = np.linspace(0, 10, 30).reshape(3, 10)
In [61]:
model = \
    make_union(
        PolynomialFeatures(),
        RadialFeatures(mu_)
    )
If we run this on the original 442x10 dataset, we expect to get a dataset with the same number of samples and $\begin{pmatrix} 12 \\ 2 \end{pmatrix} + 3 = 66 + 3 = 69$ features.
In [62]:
model.fit_transform(X).shape
Out[62]:
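To make the concatenation explicit, here is a quick check (reusing mu_ from above) that the union's output matches a manual np.hstack of the individual fit_transform results:

manual = np.hstack((
    PolynomialFeatures().fit_transform(X),
    RadialFeatures(mu_).fit_transform(X),
))
np.allclose(model.fit_transform(X), manual)  # True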
The above union applies the basis functions to the entire dataset, but we're interested in applying different basis functions to different features. To do this, we can define a rather frivolous transformer that does nothing but slice the input data, and that's exactly what ArraySlicer was for.
In [67]:
model = \
    make_pipeline(
        ArraySlicer(np.index_exp[:, np.array([2, 5])]),
        PolynomialFeatures()
    )
In [68]:
model.fit(X)
Out[68]:
In [69]:
model.transform(X).shape
Out[69]:
In [70]:
# sanity check
model.transform(X)[:5].round(2)
Out[70]:
Then we can put all of this together to form the complete pipeline we showed earlier.
In [72]:
model = \
    make_pipeline(
        make_union(
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([2, 5])]),
                PolynomialFeatures()
            ),
            make_pipeline(
                ArraySlicer(np.index_exp[:, np.array([6, 8, 9])]),
                RadialFeatures(mu)
            )
        ),
        LinearRegression()
    )
This gives us a predictor which takes some input, slices out the respective features, churns them through the basis functions and finally trains a linear regressor on the result, all in one go!
In [73]:
model.fit(X, y)
Out[73]:
In [74]:
model.score(X, y)
Out[74]:
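Since the last step is a predictor, the fitted pipeline also exposes predict, so we can generate predictions directly from the raw 442x10 input, for example:

# predictions for the first few instances, straight from the raw features
model.predict(X[:5])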